Data Mining Using Learning Techniques for Fraud Detection

Authors

  • Pooja Sachdeva
  • Sangeeta Behl
Abstract

Data mining is a combination of database and artificial intelligence technologies. It is the process of identifying and extracting patterns from data, particularly from very large or complex data sets. The major focus of machine learning research is to learn automatically to recognize complex patterns and make intelligent decisions based on data. Data mining and machine learning are relatively new techniques that are proving to be extremely effective in detecting fraud, and they offer insurers new opportunities to reduce losses. Fraud involving cell phones, insurance claims, tax return claims, credit card transactions, and so on represents a significant problem for governments and businesses, yet detecting and preventing fraud is not a simple task. Fraud is an adaptive crime, so detecting and preventing it requires special methods of intelligent data analysis. These methods exist in the areas of Knowledge Discovery in Databases (KDD), data mining, machine learning, and statistics.

Techniques used for fraud detection fall into two primary classes: statistical techniques and artificial intelligence.

Examples of statistical techniques:

  • Data preprocessing techniques for detection, validation, error correction, and filling in of missing or incorrect data.
  • Calculation of various statistical parameters such as averages, quantiles, performance metrics, and probability distributions.

The main AI techniques used for fraud management include:

  • Data mining to classify, cluster, and segment the data and to automatically find associations and rules in the data that may signify interesting patterns, including those related to fraud.
  • Machine learning techniques to automatically identify characteristics of fraud.

This paper presents the detection of fraud through data mining and machine learning techniques.

1. INTRODUCTION

“Data Mining”: the extraction of hidden predictive information from large databases.
Data mining derives its name from the similarities between searching for valuable business information in a large database (for example, finding linked products in gigabytes of store scanner data) and mining a mountain for a vein of valuable ore. Data mining is a combination of database and artificial intelligence technologies.

“Machine Learning”: the ability of a program to learn from experience, that is, to modify its execution on the basis of newly acquired information. In other words, it is the ability of a machine to improve its performance based on previous results.

“Fraud”: any act of deception carried out for the purpose of unfair, undeserved, and/or unlawful gain. Fraud is an adaptive crime, so detecting and preventing it requires special methods of intelligent data analysis.

Types of fraud:

  • Credit card fraud
  • Insurance claim fraud
  • Mobile / cell phone fraud
  • Insider trading

“Fraud Detection”: the detection of fraud cases from logged data of system and user behavior. Data mining and machine learning are relatively new techniques that are proving to be extremely effective in detecting fraud, and they offer insurers new opportunities to reduce losses.

2. COMMON MACHINE LEARNING

There are many types of machine learning:

  • “Supervised Learning”, in which the data is labeled with the correct answers. The two most common types of supervised learning are classification and regression. An example of a classification problem is teaching a computer to recognize handwritten digits. Supervised learning can also be used in medical diagnosis: for instance, given a set of attributes about potential cancer patients, and whether those patients actually had cancer, the computer could learn to distinguish between likely cancer patients and possible false alarms.
  • “Unsupervised Learning”, in which we are given a collection of unlabeled data that we wish to analyze in order to discover patterns within it, e.g. dimension reduction and clustering.
The goal is to have the computer learn how to do something that we do not tell it how to do. Clustering can be useful when there is enough data to form clusters (though this turns out to be difficult at times), and especially when additional data about members of a cluster can be used to produce further results due to dependencies in the data.
  • “Reinforcement Learning”, in which an agent seeks to learn the optimal actions to take based on its past actions, e.g. a robot.

Proceedings of the 5th National Conference; INDIACom-2011. Copyright © INDIACom-2011. ISSN 0973-7529, ISBN 978-93-80544-00-7.

3. TYPE OF LEARNING FOR FRAUD DETECTION

“Anomaly Detection”: anomalies are the data points that are considerably different from the remainder of the data. An anomaly is a pattern in the data that does not conform to the expected behaviour. Anomaly detection is an unsupervised method for fraud detection.

Applications: credit card fraud detection, telecommunication fraud detection, network intrusion detection, fault detection.

General steps:

  • Build a profile of the “normal” behavior. The profile can be patterns or summary statistics for the overall population.
  • Use the “normal” profile to detect anomalies. Anomalies are observations whose characteristics differ significantly from the normal profile.

In this example, N1, N2, N3, and N4 are regions of normal behaviour, and points O1, O2, O3, O4, and O5 are anomalies.

4. TYPES OF ANOMALY DETECTION

Graphical and statistical-based: calculation of various statistical parameters such as averages, quantiles, performance metrics, and probability distributions. For example, the averages may include average length of call, average number of calls per month, and average delay in bill payment. Various business activities can also be modeled, either in terms of various parameters or as probability distributions. Box plots (1-D), scatter plots (2-D), and spin plots (3-D) are the graphical approaches for detecting fraud.
Example of graphical approach: here the point P1 is different from the other points in the series; it is an anomaly, or outlier.

The major limitations of the graphical approach to detecting fraud are:

  • Time consuming
  • Subjective
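
The two general steps of anomaly detection (build a profile of normal behaviour, then flag observations that differ from it) can also be written as a numeric version of the 1-D box plot. The following is a minimal sketch and not the authors' method. It assumes the conventional box plot whisker rule of 1.5 times the interquartile range; the function names `build_profile` and `find_anomalies` and the call-length values are hypothetical, chosen only for illustration.

```python
import statistics


def build_profile(values):
    """Summarise 'normal' behaviour using the quartiles a box plot would draw."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    # Assumption: the conventional 1.5 * IQR whisker rule bounds the normal range.
    return {"low": q1 - 1.5 * iqr, "high": q3 + 1.5 * iqr}


def find_anomalies(values, profile):
    """Return the observations that fall outside the normal range."""
    return [v for v in values if v < profile["low"] or v > profile["high"]]


# Hypothetical call lengths in minutes; the last value is deliberately extreme.
call_lengths = [3, 4, 5, 4, 6, 5, 4, 3, 5, 60]

profile = build_profile(call_lengths)
anomalies = find_anomalies(call_lengths, profile)
print(anomalies)
```

With these illustrative values, only the 60-minute call falls outside the normal range. An explicit rule of this kind is one way to reduce the subjectivity of reading a plot by eye, although the 1.5 multiplier is itself a choice that may need tuning for real data.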

Similar resources

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Identification of Fraud in Banking Data and Financial Institutions Using Classification Algorithms

In recent years, due to the expansion of financial institutions, as well as the popularity of the World Wide Web and e-commerce, a significant increase in the volume of financial transactions has been observed. In addition to the increase in turnover, a huge increase in the number of frauds through abnormal user behaviour is resulting in billions of dollars in losses around the world. T...

MEFUASN: A Helpful Method to Extract Features using Analyzing Social Network for Fraud Detection

Fraud detection is one of the ways to cope with damages associated with fraudulent activities that have become common due to the rapid development of the Internet and electronic business. There is a need to propose methods to detect fraud accurately and fast. To achieve accuracy, fraud detection methods need to consider both kinds of features: features based on user level and features based o...

Combination of Ensemble Data Mining Methods for Detecting Credit Card Fraud Transactions

Credit cards speed up transactions and make life easier for citizens and bank customers. They can be used anytime and anywhere according to personal needs, quickly and without hassle, without the worry of carrying a lot of cash, and with more security than cash. Together, these factors make credit cards one of the most popular forms of online banking. This has led ...

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

Publication date: 2011